Business Analytics in Customer-Focused Organizations




Big Data vs. big Data? Business Analytics in Customer-Focused Organizations. Dr. Elan Sasson, sasson.elan@gmail.com, www.datascience.co.il, 30/3/2015

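Objectives
o What is Big Data Analytics?
o What is Data Science?
o DDD and its importance in customer- and service-oriented organizations
o The variety of data sources
o Basic concepts in creating Data Products
o Recommendations and approaches for building analytical capabilities
o Data scientists, and why this might interest... you
o Fundamental concepts in data mining
o Managing business-analytics-driven projects (CRISP-DM)
o Data Privacy and why it matters
o Code examples: Data/Text mining in R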

[Figure: trend of Google searches for "Big Data" and "Data Science" over time, showing the popularity of the terms]
Data Science: the connective tissue between big data processing technologies and data-driven decision-making (DDD) (Provost & Fawcett, 2013)

Terminology
o Data-Driven Decision-Making (DDD) refers to the practice of basing decisions on the analysis of data, rather than purely on intuition (Provost & Fawcett, 2013).
o Data Science is a set of fundamental principles that support the extraction of information and knowledge from data. It involves principles, processes, and techniques for understanding phenomena via the (automated) analysis of data.
o Big Data Technologies are used to process and handle big data, and include preprocessing prior to implementing data mining techniques.
The new approach to Business Analytics

Why do we really care?
o DDD affects firm performance: the more data-driven a firm is, the more productive it is, with a 4%-6% increase, and DDD is highly correlated with higher ROI, ROE, asset utilization, and market value (Brynjolfsson et al., "Strength in numbers: How does data-driven decision making affect firm performance", 2013, MIT).
o Use of BD technologies correlates with significant additional productivity growth: a 3% increase in productivity over the average firm (Tambe P., "Big data know-how and business value", 2012, NYU).
o Competitive advantage: "What can I now do that I couldn't do before, or do better than I could do before?"

3 Principles of the new era of computing
Data will be the basis of competitive intelligence for any organization: companies, government entities, cities, and individuals. Data in this new era is not a limited resource.
o Changing how we make decisions: decisions will be based not on intuition or past experience, but on predictive analytics.
o Changing how we create value: organizations, private and public, will become social enterprises.
o Changing how we deliver value: success will depend upon the ability to create products and services for individuals, not market segments.
http://asmarterplanet.com/blog/2013/03/ibm-ceo-ginni-rometty-gaining-competitive-advantage-in-the-newera-of-computing.html

Big Data Everywhere!
Lots of data is being collected and warehoused:
o Transactional data
o Web data, e-commerce
o Purchases at department/grocery stores
o Bank/credit card transactions
o Social networks
o Multimedia content
o Scientific data
o Network sensors
o Mobile phones
o User-generated content
o Internet of Things
Data is becoming the new currency, a vital natural resource.
Datafication: taking all aspects of life and turning them into data ("The rise of big data", 2013, Foreign Affairs)

What to do with these data?
o Aggregation and statistics: data warehouse, OLAP
o Indexing, searching, and querying: keyword-based search, pattern matching (RDF/XML)
o Knowledge discovery: data mining, text mining, graph mining, statistical modeling
Big Data, Big Assumptions
o Collecting and using a lot of data rather than small samples ("N = All")
o Accepting messiness in your data
o Giving up on knowing the causes

Big Data Use Cases
Big Data can play a significant economic role for private commerce, the public sector, and national economies.

big Data: The enterprise perspective
Enterprise data is big, but it is not Google-big.
[Diagram] IT-oriented: OLTP, ETL, OLAP (the classic BI boundary, "Dash-bored"). Business-oriented: OLTP / dark data / logs / social / web, ETL, Big Data Warehouse (augmented DWH + extreme-scale analytics).

The existing solution: DWH
o What are the best-selling types of banking products?
o What is the distribution of revenue by banking product?
o What is the distribution of expenses by headquarters unit?
o What is the profitability by product across the time dimension and the branch dimension?
o Which product types show a seasonal trend?

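The problem space
o Who are the most promising potential customers for a loan above NIS 300,000?
o How can the process of granting credit to a new customer be shortened?
o What are the characteristics of a churning customer?
o What are the characteristics of a profitable customer?
o Which new products should be offered to existing customers?
o How can processes in the organization be made more efficient?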

The solution space: from Business Intelligence to Business Analytics, from DWH/OLAP to Large Scale Data/Text Mining
Verification-Based Analysis (a verification-based analytical process)
o The user posits some hypothesis
o Confirmation/refutation techniques are applied
o User-driven processes
o Depends on the ability to make correct assumptions, choose the tools, and interpret the results
Discovery-Based Analysis (a discovery-based analytical process)
o Tools/algorithms operating on the data space uncover hidden patterns
o Clustering, prediction, and association processes
o Unsupervised Machine Learning
o Processes that require a large historical database
The two are complementary processes.

Big Data Architecture & Pipeline
[Diagram] Data sources (internal/external): streams, network/sensor, Internet of Things, video/audio. Information ingestion: Unified Information Access (UIA), master data, data integration, stream processing. Landing area, zone & archive: raw data, structured data, unstructured data. Exploration, analytics, discovery (descriptive, predictive, prescriptive, operative): real-time analytics, entity analytics, text analytics, data mining, machine learning, complex event processing. Outputs: intelligence analysis, decision management, BI & predictive analytics, reporting & discovery, business processes.

Data-Analytic Thinking
One of the most critical aspects of data science is the support of data-analytic thinking throughout the organization.
o Data-oriented business environment
o Basic understanding of the basic principles, in order to assess and envision opportunities accurately (data-analytics projects)
o Professional advantage in being able to interact competently (data-analytics team)
o Business units must interact with the data science team (domain knowledge)
o Data science projects require close interaction with the business people responsible for decision making

Conveying the message
o Data mining is moving from the research arena into the pragmatic world of business
o There is a continuous effort to refine algorithms and come up with new ones
o With new developments in algorithms and architecture, small-scale development teams can now build large-scale projects
o Practical data mining weighs the trade-offs between the most advanced and accurate model and the costs and complexity of a real-world business environment
o New analytics tools and platforms make data mining much easier and more powerful for people at all levels of expertise
o The Hadoop-based computing ecosystem is evolving rapidly, making projects with very large-scale datasets much more affordable

The Ladder Approach
Build a foundation
o Learn to think analytically (data mining models, visualization, statistics, etc.)
o Develop a strategy and road map based on business needs (pick a theme)
o C-level management engagement (presentation)
o Adopt a step-by-step process (from problem definition to results: CRISP-DM)
o Pick and learn a tool (R, Python, etc.)
o Practice on small datasets
Build a portfolio
o Deliverable POCs and pilot projects (3-5)
o Quick wins
o Practice on small datasets
o Write up findings (storytelling)
Deliver solutions
o Adopt technology infrastructure (HDFS, MapReduce, NoSQL, Spark SQL, etc.)
o Ongoing revisions of models (data products)
o Continue to apply advanced analytics
Business Scope & Deliverables

Rethinking the Business & IT Model
Data Management & Business Analytics are Core Business Competencies
o The Business Owns the Data
o Recognize Analytics as a Business Driven and Owned Process
o Technology is an Enabler
Shift to Business Configurable and Controlled
o Acknowledge the Difference between Software Development and Business Analytics
o Redefine the IT Support Model to Enable the Business to Acquire, Assess, Analyze, Test, and Deploy Analytical Outcomes
Change the IT Funding & Financial Model
o Current Infrastructure Model is Geared towards Legacy & Transactional Platforms
o Recognize Analytics as a Business Driven and Owned Process; Technology is an Enabler
Slide source: MetLife presentation, IBM Big Data conference, October 2013

Big Data Adoption
o Outlining a work plan
o Examining scenarios (one or more for implementation)
o A 10-session course on Big Data, Data Science & Data Mining
o An 8-session Data-Analytic Thinking course for business people
o Establishing a "360" team: R&D Team, Infrastructure & Operations, Business Unit Analysts, Business IT Support Team

The Data Journey
What is done in the organization today:
1. OLAP reports along the product, marketing, and pricing dimensions
2. Data mining models?...
o Internal Data: existing operational information in the data warehouse
o New Internal Data (Dark Data): existing information that is not defined in the data warehouse: structured information, emails, textual information (agents, appraisers...)
o New External Data: information from external sources: the Internet, competitors, social networks, location-based cellular information, telematics, sensors, and more
Estimate: 80% of the information in the organization is unstructured and unmodeled, and is therefore not available for analysis with the existing, traditional tools.
Data Management before Business Analytics: in the first stage we will not boil the ocean...
o Big Data doesn't have to be big; it can be managed and built incrementally.
o Big Data may or may not include social media (eventually it will).
o Big Data may or may not include external data (eventually it will).
o Sometimes information is good enough.

Data Products
Motivation: turning data assets into products and services
o A data product is an algorithm, software, application, presentation, or reproducible report based on data analytics
o A data product is the production output of statistical analysis, data mining, text mining, AI, etc.
o Initially online companies: search algorithms (Google), similar offerings (Amazon), recommendations for people you may know (Facebook)
o "A data product is a product that facilitates an end goal through the use of data." (DJ Patil)
o Developing and launching data products, particularly if you are an offline business: it won't be second nature...
o Data-as-a-Service (DaaS): a cloud strategy used to facilitate the accessibility of business-critical data in a well-timed, protected, and affordable manner; a B2B data "renting" service

The Model Assembly Line
o Do you have the data?
o Do you own the data? (legal issues; consider anonymized personal data)
o Is it high-quality and useful data?
o Do you have a business model? (bundling, selling, free)
o What types of analysis are you offering? (descriptive analytics vs. predictive analytics)
o Do you have differentiation or competitive advantage? (proprietary vs. commodity data)

The Model Assembly Line: A case study of DaaS for cellular companies
Location-based information: city centers
o Customers: developers of social applications
o Geographic division: city center, shopping areas, entertainment areas, business centers
o Data update frequency: daily/weekly/monthly
o Data accessibility: online/batch
o Customer classification: business, private
o Communication type: voice, SMS
Location-based information: main traffic arteries
o Customers: municipalities and government planning institutions
o Geographic division: city, suburb
o Road type: intercity expressway, urban road, highway
o Data update frequency: daily/weekly/monthly
o Data accessibility: online/batch
Pricing models
o Volume-based model: quantity-based pricing (amount), pay-per-call (PPCall)
o Data-type-based model: based on the type or attribute of data
o Subscription-based model: an unlimited amount of data
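The pricing models named on this slide can be compared with a few lines of arithmetic. The following is only a sketch: the function names, the usage figures, and every price are hypothetical values chosen for illustration and do not come from the case study.

```python
def quantity_based(records, price_per_record):
    """Volume-based model: pay for the amount of data delivered."""
    return records * price_per_record

def pay_per_call(calls, price_per_call):
    """Volume-based model: pay for each request made to the data service."""
    return calls * price_per_call

def data_type_based(records_by_type, price_by_type):
    """Data-type-based model: the price depends on the type or attribute of the data."""
    return sum(count * price_by_type[kind] for kind, count in records_by_type.items())

def subscription(months, monthly_fee):
    """Subscription-based model: a flat fee for an unlimited amount of data."""
    return months * monthly_fee

# Hypothetical monthly usage of one customer (illustrative numbers only).
usage = {"location": 100, "voice": 50}
costs = {
    "quantity": quantity_based(sum(usage.values()), 2),
    "per_call": pay_per_call(30, 10),
    "data_type": data_type_based(usage, {"location": 2, "voice": 1}),
    "subscription": subscription(1, 400),
}
cheapest = min(costs, key=costs.get)
```

With these made-up figures the data-type-based model is the cheapest; a heavier user would eventually cross the point where the flat subscription wins, which is the comparison a DaaS provider has to reason about when setting prices.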

Implementation Approaches
o The Full Service Approach: relying on a 3rd party to develop and maintain the model
o The Full Control Approach: in-house model development and deployment
o The Consultant Approach: hybrid methodology

Implementation Approaches: The Full Service Approach (relying on a 3rd party to develop and maintain the model)
Pros:
o The ideal solution for companies that are resource constrained
o The ideal solution for companies lacking technical and analytics staff
o The model development can rely on expertise provided by the vendor
o The quickest path to implementation
Cons:
o Reliance on the vendor to provide a solution without any independent review
o Not being able to make changes to the model directly
o Internal staff is not trained to ensure attainment of the desired results

Implementation Approaches: The Full Control Approach (in-house model development and deployment)
Pros:
o The ideal solution for companies with analytics and IT resources
o Helps to protect IP in the case of a novel idea or product
o Offers the most flexibility in making revisions or customizations to the model
Cons:
o The firm can't take advantage of any data or expertise accumulated by vendors and consultants
o If a fundamental modeling error has been made, it may never be discovered
o Historically the slowest path to deployment, with successful implementations measured in years(?)

Implementation Approaches: The Consultant Approach (hybrid methodology)
Build your own core competencies coupled with high-end data science consultancy.
Pros:
o The ideal solution for companies lacking depth in their analytics department, but who have available resources in systems and IT
o There is a built-in independent review phase in this approach
o Companies are able to make changes directly to the model as needed
Cons:
o If companies lack internal technical or analytical resources, they may be at the mercy of the vendor in the future should a model update or revision be needed
o Some companies attempt to update vendor models but lack in-depth knowledge of the modeling techniques used; as a result, they may inadvertently make fundamental modeling errors
o Requires continuous management attention

Roles in Data Science
Data Scientist: applied statistician x computer scientist
o Computer science
o Math
o Statistics
o Machine learning
o Domain expertise
o Communication and presentation skills
o Data visualization
No one person can be the perfect data scientist. A team?
"Data Scientist (noun): better at statistics than any software engineer and better at software engineering than any statistician." (Josh Wills)
A shortage of 140,000 to 190,000 people with deep analytical skills, as well as 1.5 million managers and analysts to analyze big data (McKinsey, 2011)

Data Scientist: skills required to exploit big data
o Skills to work with business stakeholders to understand the business issue and context
o Analytical and decision modeling skills for discovering relationships within data and proposing patterns
o Data management skills to build the relevant dataset used for the analysis
A broad combination of soft and technical skills.
Sample of program offerings:
o DB: Databases
o BI: Business Intelligence, Data Warehousing
o ST: Advanced-Level Statistics
o BA: Business Analytics, Web Analytics
o DM: Data Mining, Machine Learning, Text Mining, Natural-Language Processing
o BD: Big Data Technologies, Visualization
o KM: Knowledge Management, Social-Web Analysis
"Cosmologists of the digital universe"
http://online.wsj.com/article/sb10001424127887323478304578332850293360468.html?mod=itp

Building Models: Introduction
A model captures the knowledge exhibited by the data and encodes it in some language; no model can perfectly represent the real world.
Data mining is the automatic or semi-automatic extraction of patterns or knowledge that are interesting, non-trivial, implicit, previously unknown, and potentially useful:
o Forecasting what may happen in the future
o Classifying items into groups by recognizing patterns
o Clustering items into groups based on their attributes
o Associating what events are likely to occur together
o Sequencing what events are likely to lead to later events

Building Models: Introduction
Models fall into two categories of data mining, descriptive and predictive:
o Predictive tasks: use some variables to predict unknown or future values of other variables
o Descriptive tasks: find human-interpretable patterns that describe the data
Learning approaches: supervised learning, unsupervised learning, meta learning (ensemble learners)
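The difference between a predictive (supervised) task and a descriptive (unsupervised) task can be seen in a very small sketch. The two techniques below, a 1-nearest-neighbor classifier and a two-cluster k-means on one variable, are stand-ins chosen for brevity; the deck does not name them here, and the customer figures are invented.

```python
def nearest_neighbor_predict(train, query):
    """Supervised: `train` holds (feature, label) pairs; return the label
    of the training example whose feature is closest to `query`."""
    _, label = min(train, key=lambda pair: abs(pair[0] - query))
    return label

def two_means(values, iterations=10):
    """Unsupervised: split 1-D values into two groups without any labels
    (k-means with k = 2, centers initialized at the min and max)."""
    c_low, c_high = min(values), max(values)
    low, high = list(values), []
    for _ in range(iterations):
        low = [v for v in values if abs(v - c_low) <= abs(v - c_high)]
        high = [v for v in values if abs(v - c_low) > abs(v - c_high)]
        c_low = sum(low) / len(low)
        if high:  # keep the old center if the second group is empty
            c_high = sum(high) / len(high)
    return low, high

# Hypothetical data: complaints per year with a known outcome (supervised) ...
history = [(1, "stays"), (2, "stays"), (9, "churns"), (10, "churns")]
prediction = nearest_neighbor_predict(history, 8)

# ... and monthly spend with no outcome attached (unsupervised).
spend = [10, 12, 11, 50, 55, 52]
low_spenders, high_spenders = two_means(spend)
```

The supervised function needs historical examples with a known answer, while the clustering function only needs the raw values and leaves the interpretation of the two groups to the analyst.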

Types of Data Mining Tasks
Many business problems have one of these DM tasks as an important component.
Unsupervised:
o Affinity grouping (a.k.a. associations, market-basket analysis): What items are commonly purchased together?
o Similarity matching: What other companies are like our best small business customers?
o Description/profiling: What does "normal behavior" look like?
o Clustering: Do my customers form natural groups?
Supervised:
o Predictive modeling (including causal modeling & link prediction): Will customer X churn next month / default on her loan? How much would prospect X spend? Who might be good friends on our social networking site?
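Affinity grouping, the first task above, starts from a simple count: how often does each pair of items appear in the same basket? The sketch below computes that share (the "support" of a pair) for a handful of invented baskets; the function name and the data are illustrative assumptions, and real market-basket tools add further measures such as confidence and lift.

```python
from collections import Counter
from itertools import combinations

def pair_support(baskets):
    """Return, for every pair of items, the share of baskets containing both."""
    counts = Counter()
    for basket in baskets:
        # Sort and de-duplicate so each pair is counted once per basket
        # and always appears in the same (alphabetical) order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: count / len(baskets) for pair, count in counts.items()}

# Hypothetical purchase data.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers"],
]
support = pair_support(baskets)
frequent_pairs = sorted(pair for pair, s in support.items() if s >= 0.5)
```

Pairs whose support passes a chosen threshold (0.5 here, an arbitrary value) are the candidates for "items commonly purchased together".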

Data Mining vs. Deployment

Merging Traditional & Big Data approaches

Merging Traditional & Agile approaches
o Time to market: slow process
o Disconnect between the business people (consumers) and IT people (producers)
o The overall cost is high
Breaking down the walls
o A discovery process, not a traditional SW development project
o The business owns the data

Codification of the Process
Extracting useful knowledge from data to solve business problems can be treated systematically by following a process with reasonably well-defined stages.
CRISP-DM: the Cross-Industry Standard Process for Data Mining (www.crisp-dm.org) (CRISP-DM; Shearer, 2000)
A structured process with critical points:
o Human intuition
o High-powered analytical tools
A well-understood process that places a structure on a problem, which still involves art, science, craft, creativity, and common sense.

CRISP-DM
[Process diagram] The diagram makes explicit the fact that iteration is the rule rather than the exception: this is not a linear process. Annotations on the diagram:
o The point of actually using your results
o Preparatory activity: what data? where is the data? accuracy and reliability of the data
o Both mathematical and logical
o The most substantial component (65%): time-consuming and labor-intensive

CRISP-DM
Business Understanding
o A creative problem formulation: what is the problem?
o Think carefully about the use scenario and the actual business need
o What exactly do we want to do? How exactly would we do it?
o What parts of this use scenario constitute possible data mining models?
Data Understanding
o It is important to understand the strengths and limitations of the data
o Historical data often are collected for purposes unrelated to the current business problem
o Estimating the costs and benefits of each data source
o Data have varying degrees of reliability
o Cost of acquiring the data
o Data manipulation
o Data quality

CRISP-DM
Data Preparation
o Pre-processing tasks
o Data conversions
o Data transformations (e.g., normalization, scaling, etc.)
o Missing values, outliers
o Redundant or non-informative features (i.e., feature selection, between-predictor correlations)
o Dimensionality reduction techniques (e.g., PCA, SVD)
Modeling
o The primary place where data mining techniques are applied to the data
o It is important to have some understanding of the fundamental ideas of data mining, including the sorts of techniques, algorithms, and tuning parameters
Evaluation
o The evaluation stage assesses the data mining results rigorously, to gain confidence that they are valid and reliable before moving on
o Measuring model performance and generalization
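Two of the data preparation steps listed above, handling missing values and scaling, can be sketched with the Python standard library alone. Mean imputation, min-max scaling, and z-scores are common choices but only some of the options; the function names and the income figures are illustrative assumptions.

```python
from statistics import mean, pstdev

def impute_mean(values):
    """Replace missing entries (None) with the mean of the known entries."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: nothing to scale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical income column (in thousands) with one missing value.
raw_income = [10.0, None, 30.0, 20.0]
clean_income = impute_mean(raw_income)
scaled_income = min_max_scale(clean_income)
standardized_income = z_score(clean_income)
```

In a real project the fill value and the scaling parameters would be computed on the training data only and then reused on new data, so that the evaluation stage measures generalization honestly.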

Basic Principles: Privacy
o Collection limitation: data should be obtained lawfully and fairly, while some very sensitive data should not be held at all.
o Data quality: data should be relevant to the stated purposes, accurate, complete, and up-to-date; proper precautions should be taken to ensure this accuracy.
o Purpose specification: the purposes for which data will be used should be identified, and the data should be destroyed if it no longer serves the given purpose.
o Use limitation: use of data for purposes other than those specified is forbidden.
Source: Organisation for Economic Co-operation and Development (OECD), 1980.

Data Science Course
o Applications and uses of Big Data
o A variety of data mining models for Predictive and Descriptive Analytics and Exploratory Data Analysis, including among others: Cluster Analysis, Association Analysis, Decision Trees & Random Forest, Support Vector Machine, Neural Networks, Anomaly Detection
o Graph mining and Social Network Analysis, with fundamental concepts such as Degree & Degree Distribution, Centrality, Betweenness, Closeness, Centralization, and more
o NLP-based methods for mining textual data, for Text Categorization and Information Extraction
o Fundamental concepts of Information Retrieval, and Bag-of-Words-based methods for representing textual data
o Using the R environment for statistical exploration, mining, and presentation of data
o Visualization and graphics approaches for applications based on textual data analysis (co-occurrence graph, neighborhood network, and more)
o Advanced technologies for data management, and storage and processing architectures
o The CRISP-DM model for managing business analytics projects
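Two of the concepts in the course outline, the Bag-of-Words representation and node degree, are small enough to sketch directly. The course itself works in R; the sketch below uses Python for brevity, and the tokenization rule, the sample sentences, and the friendship edges are all illustrative assumptions.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, keep alphabetic tokens, and count each token."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def degrees(edges):
    """Degree of each node in an undirected graph given as (node, node) pairs."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree

# Bag-of-Words: every document becomes a vector of term counts
# over a shared, sorted vocabulary; word order is discarded.
docs = ["Big data is big", "Data science needs data"]
bows = [bag_of_words(d) for d in docs]
vocabulary = sorted(set().union(*bows))
vectors = [[bow[term] for term in vocabulary] for bow in bows]

# Degree centrality: degree divided by the maximum possible degree, n - 1.
friendships = [("ana", "ben"), ("ana", "chen"), ("ana", "dan"), ("ben", "chen")]
degree = degrees(friendships)
centrality = {node: d / (len(degree) - 1) for node, d in degree.items()}
```

The resulting vectors are what text categorization models consume, and the centrality scores are the simplest of the social network measures listed above; betweenness and closeness require shortest-path computations and are not shown.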

Why R?
R is a free and open source language and environment for statistical computing and graphics. R is among the most popular of the leading software packages for statistical analysis.
Key features:
o It's mature and widely used (NYT)
o Excellent graphics capabilities: http://www.sr.bham.ac.uk/~ajrs/r/r-gallery.html
o Highly extensible, with over 4300 user-contributed packages
o It's easy to use and has excellent online help and associated documentation: http://cran.r-project.org/other-docs.html (manuals, tutorials, etc. provided by users of R)

Big Data is a representation of a process with evolutionary trends: complexity, diversity, and specialization.
Thank you for listening. sasson.elan@gmail.com